FSM and k-nearest-neighbor for corpus based video-realistic audio-visual synthesis

Author

  • Christian Weiss
Abstract

In this paper we introduce a corpus-based 2D video-realistic audio-visual synthesis system. The system combines a concatenative Text-to-Speech (TTS) system with a concatenative Text-to-Visual (TTV) system into a Text-to-Audio-Visual-Speech (TTAVS) system whose audio and lip movements are synchronized. For the concatenative TTS we use a Finite State Machine (FSM) approach to select non-uniform, variable-size audio segments. Analogously to the TTS, a k-nearest-neighbor algorithm selects the visual segments: prior to the selection process, image filtering extracts features that feed a Euclidean distance measure, which is used to minimize distortions when the visual segments are concatenated. For the Euclidean metric we consider only the start frame and end frame of each potential video-frame sequence. The visual equivalents of the selected segments are chosen using a visemic transcription derived from the phonemic transcription of the given input text. Because the speech and video come from independent source databases, we synchronize the generated signals linearly. The resulting utterance is audio-visual speech with synchronized audio and lip movement. The system is easily adapted to new speakers, whether a different speech source or a different video source is used.
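
The visual selection and synchronization steps described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the feature vectors, segment names, the value of `k`, and the function names `k_nearest_segments` and `linear_sync` are all hypothetical, and the image filtering that would produce the features is assumed to have happened already. The sketch ranks candidate segments by the Euclidean distance between the end-frame features of the previous segment and each candidate's start-frame features, then spreads the video frames evenly over the audio duration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest_segments(prev_end_features, candidates, k=3):
    """Return the k candidates whose start-frame features lie closest
    to the end-frame features of the previously chosen segment.

    candidates: list of (segment_id, start_frame_features) pairs.
    Only boundary frames are compared, as the abstract states.
    """
    ranked = sorted(candidates, key=lambda c: euclidean(prev_end_features, c[1]))
    return ranked[:k]


def linear_sync(n_frames, audio_duration_s):
    """Spread n_frames evenly over the audio duration (a linear mapping).

    Returns the display time in seconds of each video frame.
    """
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    step = audio_duration_s / n_frames
    return [i * step for i in range(n_frames)]


# Hypothetical feature vectors, standing in for filtered image features.
prev_end = [0.2, 0.5, 0.1]
candidates = [
    ("seg_a", [0.9, 0.9, 0.9]),
    ("seg_b", [0.2, 0.4, 0.1]),
    ("seg_c", [0.3, 0.5, 0.3]),
]

best = k_nearest_segments(prev_end, candidates, k=2)
print([segment_id for segment_id, _ in best])  # closest candidates first
print(linear_sync(4, 2.0))  # frame display times for 2 s of audio
```

Comparing only boundary frames keeps the cost of each join independent of segment length, which matters when the candidate video sequences vary in size.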

Similar resources

A Framework for Data-driven Video-realistic Audio-visual Speech-synthesis

In this work, we present a framework for generating a video-realistic audio-visual “Talking Head”, which can be integrated in applications as a natural Human-Computer interface where audio only is not an appropriate output channel, especially in noisy environments. Our work is based on a 2D-video-frame concatenative visual synthesis and a unit-selection based Text-to-Speech system. In order to ...

FUZZY K-NEAREST NEIGHBOR METHOD TO CLASSIFY DATA IN A CLOSED AREA

Clustering of objects is an important area of research and application in a variety of fields. In this paper we present a technique for data clustering and apply it to data clustering in a closed area. We compare this method with K-nearest neighbor and K-means.

Phoneme-Viseme Mapping for German Video-Realistic Audio-Visual-Speech-Synthesis IKP-Working Paper NF 11

In this working paper we introduce a German viseme set which we already use in our data-driven audio-visual synthesis system. The viseme set is essential for speech driven audio-visual synthesis due to the fact that the selection of appropriate video segments is based on the visemically transcribed input text. For text-to-speech synthesis, a transcription of the input text into the phonemic rep...

Crim’s Content-based Copy Detection System for Trecvid

Approach we have tested in our submitted runs: For visual-based copy detection, we find links between video shot key-frames using a probabilistic latent space model over local matches between the key-frame images. This facilitates the extraction of significant groups of local matching descriptors that may represent common semantic elements of near-duplicate key-frames. For 2009, we have worked on...

Identification of selected monogeneans using image processing, artificial neural network and K-nearest neighbor

Abstract: Over the last two decades, improvements in computational tools have contributed significantly to classifying images of biological specimens into their corresponding species. Identification of biological species is now much easier for taxonomists and even non-taxonomists thanks to the development of automated computer techniques and systems. In this study, we d...

Publication date: 2005